CCReSD: concept-based categorisation of Hidden Web databases
Abstract
Hidden Web databases dynamically generate results in response to users’ queries. Categorising such databases into a category scheme is widely used to support information searches. We present a Concept-based Categorisation over Refined Sampled Documents (CCReSD) approach that effectively handles the information extraction, summarisation and categorisation of such databases. CCReSD detects and extracts query-related information from documents sampled from the databases, and generates terms and their frequencies to summarise database contents. It also generates descriptions of concepts from their coverage and specificity, as given in a category scheme. We conduct experiments to evaluate the approach, and they show that it assigns databases to more relevant subject categories.
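The abstract names three steps (refining sampled documents, summarising them as terms and frequencies, and categorising with coverage and specificity) without defining them. The sketch below illustrates one plausible shape for that pipeline under stated assumptions; it is not the CCReSD algorithm. It assumes that template text is any line shared by every sampled page, that a summary is a document-frequency table, and that each concept description is a hand-written term set. Coverage is taken here as the number of refined documents that mention a concept term, and specificity as coverage divided by the sample size. All function names, thresholds and example data are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def strip_template_lines(pages):
    """Hypothetical refinement step: drop lines that occur in every sampled page,
    on the assumption that such lines come from the site's page template."""
    if len(pages) < 2:
        return list(pages)  # a single page gives no evidence of a template
    common = set.intersection(*(set(p.splitlines()) for p in pages))
    return ["\n".join(line for line in p.splitlines() if line not in common)
            for p in pages]


def summarise(docs):
    """Summarise the refined sample as term -> number of documents containing it."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def categorise(docs, concepts, min_coverage=1, min_specificity=0.5):
    """Assign every category whose coverage and specificity pass the (assumed) thresholds.

    coverage    = number of docs containing at least one term of the concept
    specificity = coverage / number of docs
    """
    n = len(docs)
    token_sets = [set(tokenize(d)) for d in docs]
    assigned = {}
    for category, terms in concepts.items():
        coverage = sum(1 for tokens in token_sets if not terms.isdisjoint(tokens))
        specificity = coverage / n if n else 0.0
        if coverage >= min_coverage and specificity >= min_specificity:
            assigned[category] = (coverage, specificity)
    return assigned


# Toy sample: three result pages sharing a header and footer (hypothetical data).
pages = [
    "Acme Health Library\nHeart disease risk factors include smoking\nCopyright Acme",
    "Acme Health Library\nDiabetes treatment and insulin therapy\nCopyright Acme",
    "Acme Health Library\nStock market news today\nCopyright Acme",
]
concepts = {
    "Health": {"heart", "disease", "insulin", "therapy"},
    "Finance": {"stock", "market", "bank"},
    "Sports": {"football", "tennis"},
}
refined = strip_template_lines(pages)
print(summarise(refined).most_common(3))
print(categorise(refined, concepts))
```

With these toy inputs the shared header and footer are removed before summarisation. Only "Health" passes both thresholds, because "Finance" covers one of three documents and falls below the assumed specificity cut-off.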
Similar resources
Automatic Hidden Web Database Classification
In this paper, a method for automatic classification of Hidden-Web databases is addressed. In our approach, the classification tree for Hidden Web databases is constructed by tailoring the well-accepted classification tree of DMOZ Directory. Then the feature for each class is extracted from randomly selected Web documents in the corresponding category. For each Web database, query terms are sel...
Sources Selection Methodology for Hidden Web Data Integration
In internet-scale hidden web data integration, the problem of source (web database) selection has been a primary challenge. This paper proposes a novel approach to web database selection for internet-scale hidden web data integration. This approach is based on a benefit function that evaluates how much benefit a web database brings to a given state of the integration system by integrating ...
Sampling, information extraction and summarisation of Hidden Web databases
Hidden Web databases maintain a collection of specialised documents, which are dynamically generated in response to users’ queries. The majority of these documents are generated through Web page templates, which contain information that is often irrelevant to queries. In this paper, we present a system designed to detect and extract query-related information from documents sampled from database...
Discover Aggregates Exceptions over Hidden Web Databases
Nowadays, many web databases “hidden” behind their restrictive search interfaces (e.g., Amazon, eBay) contain rich and valuable information that is of significant interest to various third parties. Recent studies have demonstrated the possibility of estimating/tracking certain aggregate queries over dynamic hidden web databases. Nonetheless, tracking all possible aggregate query answers to rep...
Aggregates Disclosure in Hidden Web Databases: an Urgent Challenge
Hidden web databases are widely prevalent on the Internet. Security issues specific to hidden databases, however, have been largely overlooked by the research community, possibly due to the (false) sense of security provided by the restrictive access (i.e., web interface) to such databases. We argue that an urgent challenge facing today’s hidden databases is the disclosure of sensitive aggregat...
Journal title:
- IJHPCN
Volume 5, Issue
Pages: -
Publication date: 2007